Abstract

We propose a simple model of network co-evolution in a game-dynamical system of interacting agents that play repeated games with their neighbors, and adapt their behaviors and network links based on the outcome of those games. The adaptation is achieved through a simple reinforcement learning scheme. We show that the collective evolution of such a system can be described by appropriately defined replicator dynamics equations. In particular, we suggest an appropriate factorization of the agents' strategies that results in a coupled system of equations characterizing the evolution of both strategies and network structure, and illustrate the framework on two simple examples.



Introduction 

Many complex systems can be represented as networks where nodes correspond to entities and links encode interdependencies between them. Generally, statistical models of networks can be classified into two different approaches. In the first approach, networks are modeled via active nodes with a given distribution of links, where each node of the network represents a dynamical system. In this setting, one usually studies problems such as epidemic spreading, opinion formation, signaling, and synchronization. In the second approach, which is grounded mainly in graph theory, nodes are treated as passive elements, and the main focus is on the dynamics of link formation and network growth. Specifically, one is interested in algorithmic methods to build graphs formed by passive elements (nodes) and links, which evolve according to prespecified, often local rules. This approach has produced important results on the topological features of social, technological, and biological networks.

More recently, however, it has been realized that modeling individual and network dynamics separately is too limited to capture the realistic behavior of networks. Indeed, most real-world networks are inherently complex dynamical systems, where both the attributes of individuals (nodes) and the topology of the network (links) can have inter-coupled dynamics. For instance, it is known that in social networks, nodes tend to divide into groups, or communities, of like-minded individuals. One can ask whether individuals become like-minded because they are connected via the network, or whether they form network connections because they are like-minded. In practice, the two scenarios cannot be cleanly separated. Rather, the real world self-organizes through a combination of the two, with the network changing in response to opinion and opinion changing in response to the network. Recent research has focused on the interplay between attribute and link dynamics (see, e.g., Gross and Blasius 2008; Goyal 2005; Perc and Szolnoki 2009; and Castellano, Fortunato, and Loreto 2009 for a recent survey of the literature).



To describe the coupled dynamics of individual attributes and network topology, here we suggest a simple model of a co-evolving network that is based on the notion of interacting adaptive agents. Specifically, we consider network-augmented multi-agent systems where agents play repeated games with their neighbors, and adapt both their behaviors and the network ties depending on the outcome of their interactions. To adapt, agents use a simple learning mechanism to reinforce (punish) behaviors and network links that produce favorable (unfavorable) outcomes. We show that the collective evolution of such a system can be described by appropriately defined replicator dynamics equations. Originally suggested in the context of evolutionary game theory (see, e.g., Hofbauer and Sigmund 1998; Hofbauer and Sigmund 2003), replicator equations have been used to model collective learning and adaptation in systems of interacting self-interested agents (Sato and Crutchfield 2003).

Background and Related Work 

One of the oldest and best studied models of a network is the Erdos-Renyi random graph, defined as $G(N, p)$, where $N$ is the number of vertices and $p$ is the probability of a link between any two vertices. One of the important topological features of graphs is the degree distribution $p_k$, which is the probability that a randomly chosen node has exactly $k$ neighbors. In the large-$N$ limit, the degree distribution of the Erdos-Renyi graph is Poissonian, $p_k = e^{-z} z^k / k!$, where $z \approx pN$ is the average degree, or connectivity. While this model adequately describes the percolation transition of real networks, it fails to account for many properties of real-world networks such as the Internet, social networks, or biological networks. In particular, it has been established that many real-world networks exhibit what is called a scale-free phenomenon, where the degree distribution follows a power law $p_k \sim k^{-\gamma}$ over a very wide (orders of magnitude) range of $k$.
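A minimal sketch of the $G(N, p)$ construction described above, written in Python because the document has no code of its own. It samples one random graph and prints the empirical degree frequencies next to the Poisson law $p_k = e^{-z} z^k / k!$. The helper names and the values of $N$ and $p$ are illustrative assumptions, and $z$ is taken as $p(N-1)$, which is $\approx pN$ for large $N$.

```python
import math
import random
from collections import Counter

def erdos_renyi_degrees(n, p, seed=0):
    """Sample G(N, p) and return the degree of every node."""
    rng = random.Random(seed)
    degree = [0] * n
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:  # each pair is linked independently with probability p
                degree[u] += 1
                degree[v] += 1
    return degree

def poisson_pmf(k, z):
    """Large-N degree distribution of G(N, p): p_k = e^{-z} z^k / k!."""
    return math.exp(-z) * z ** k / math.factorial(k)

n, p = 2000, 0.002              # illustrative values; z = p(N - 1) is about 4
degrees = erdos_renyi_degrees(n, p)
z = p * (n - 1)
empirical = Counter(degrees)
for k in range(9):
    # columns: k, empirical frequency of degree k, Poisson prediction
    print(k, round(empirical[k] / n, 3), round(poisson_pmf(k, z), 3))
```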

To account for the statistical deviations of the observed properties of networks from those prescribed by the Erdos-Renyi random graph model, Barabasi and Albert (Barabasi and Albert 1999) proposed a simple model of an evolving network, based on the idea of preferential attachment. In this model, a network grows by the addition of a new node at each time step. Each new node introduced into the system chooses to connect preferentially to sites that are already well connected. Thus, nodes that have higher connectivity acquire new links at higher rates. It was shown that the network produced by this simple process has an asymptotically scale-free degree distribution of the form $p_k \sim k^{-\gamma}$ (with $\gamma = 3$). Recent variations of the initial preferential attachment model include space-inhomogeneous (Bianconi and Barabasi 2001) and time-inhomogeneous (Dorogovtsev and Mendes 2001) generalizations of the preferential attachment mechanism, ageing and redistribution of the existing links (Dorogovtsev, Mendes, and Samukhin 2000), preferential attachment with memory (Cattuto, Loreto, and Pietronero 2003), evolutionary generalizations of preferential attachment (Poncela et al. 2008), etc.
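The preferential-attachment rule can be sketched as follows. This is a hedged illustration, not code from the cited work: the seed graph (a complete graph on $m + 1$ nodes), the number $m$ of links per new node, and the function name `barabasi_albert` are our assumptions. Keeping a list of link ends turns a degree-proportional choice into a uniform draw from that list.

```python
import random
from collections import Counter

def barabasi_albert(n, m, seed=0):
    """Grow an n-node network by preferential attachment (a minimal sketch).

    Starts from a complete graph on m + 1 nodes; each new node then links to
    m distinct existing nodes chosen with probability proportional to degree.
    """
    rng = random.Random(seed)
    edges = [(u, v) for u in range(m + 1) for v in range(u + 1, m + 1)]
    # Every node appears here once per link end, so a uniform draw from this
    # list is a degree-proportional draw.
    ends = [node for edge in edges for node in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(ends))
        for t in targets:
            edges.append((new, t))
            ends.extend((new, t))
    return edges

edges = barabasi_albert(5000, 2)
degree = Counter(node for edge in edges for node in edge)
print(len(edges), min(degree.values()), max(degree.values()))
```

The maximum degree typically ends up far above the minimum degree $m$, a crude symptom of the heavy tail; estimating the exponent itself would require a log-binned histogram over a larger network.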



Holme and Newman (2006) suggested a model of co-evolving networks that combines linking with internal node dynamics. In their model, each node is assumed to hold one of $M$ possible opinions. Initially, the links are randomly distributed among the nodes. Then, at each time step, a randomly chosen node will, with probability $\phi$, re-link one of his links to a node that holds the same opinion; with probability $1 - \phi$, he will instead change his opinion to agree with the opinion of one of his (randomly chosen) neighbors. Despite the simplicity of these rules, the model was shown to have a very rich dynamical behavior. In particular, as the parameter $\phi$ is varied, the model undergoes a phase transition from a phase in which opinions are diverse to one in which most individuals hold the same opinion. Skyrms and Pemantle (2000) suggested a model of adaptive networks where agents are allowed to interact with each other through games, and reinforce links positively if the outcome of the interaction is favorable to them. They showed that even for simple games, the resulting structural dynamics can be very complex. A review of numerous other models can be found in a recent survey (Castellano, Fortunato, and Loreto 2009).
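A minimal sketch of one update step of the Holme-Newman rules summarized above. The adjacency-list representation, allowing multi-edges while excluding self-loops, and all names and parameter values (`holme_newman_step`, `n_opinions`, `phi`) are illustrative assumptions; the sketch only shows the two moves and does not reproduce the phase-transition study.

```python
import random

def holme_newman_step(adj, opinion, phi, rng):
    """One update of a Holme-Newman-style opinion/re-linking model (a sketch).

    adj[i] lists the neighbours of node i (multi-edges are allowed here);
    opinion[i] is the opinion currently held by node i.
    """
    i = rng.randrange(len(adj))
    if not adj[i]:
        return                                   # isolated nodes do nothing
    if rng.random() < phi:
        # Re-link: move the far end of one of i's links to a node sharing i's opinion.
        j = rng.choice(adj[i])
        same = [k for k in range(len(adj)) if k != i and opinion[k] == opinion[i]]
        if same:
            k = rng.choice(same)
            adj[i].remove(j)
            adj[j].remove(i)
            adj[i].append(k)
            adj[k].append(i)
    else:
        # Adopt the opinion of a randomly chosen neighbour.
        opinion[i] = opinion[rng.choice(adj[i])]

rng = random.Random(1)
n, m, n_opinions, phi = 200, 400, 20, 0.5      # illustrative parameters
adj = [[] for _ in range(n)]
for _ in range(m):                             # random initial links, no self-loops
    u, v = rng.sample(range(n), 2)
    adj[u].append(v)
    adj[v].append(u)
opinion = [rng.randrange(n_opinions) for _ in range(n)]
for _ in range(20000):
    holme_newman_step(adj, opinion, phi, rng)
print(len(set(opinion)), "distinct opinions remain")
```

Both moves conserve the number of links, so only the wiring pattern and the opinion assignment evolve.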

In addition to abstract statistical models, recent work has addressed the network formation process from the perspective of game-theoretical interactions between self-interested agents (Bala and Goyal 2000; Fabrikant et al. 2003; Anshelevich et al. 2003). In these games, each agent tries to maximize her utility, which consists of two conflicting preferences, e.g., minimizing the cost incurred by established edges and minimizing the distance to all the other nodes in the network. In the simplest scenarios of these games, the topology that corresponds to the Nash equilibrium can be obtained by solving a one-shot optimization problem. In many situations, however, when the actual cost function is more complex, this might not be possible. Furthermore, in realistic situations, agents might have only local information about the network's topology (or the utilities of the other agents in the network), so maximizing a global utility is not an option. In this case, the agents can arrive at a Nash equilibrium by dynamically adjusting their strategies.
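To make the two conflicting preferences concrete, the sketch below scores an agent with a cost of the form used in the network creation game of Fabrikant et al. (2003): a price $\alpha$ for every link the agent pays for, plus the sum of its hop distances to all other nodes. Treating an unreachable node as infinite cost, the data layout, and the function names are our assumptions for illustration.

```python
import math
from collections import deque

def distances_from(adj, src):
    """Breadth-first-search hop distances from src."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def agent_cost(bought, i, alpha):
    """Cost of agent i: alpha per link it buys plus its total distance to all others.

    bought[u] is the set of nodes u pays a link to; a built link is usable by both ends.
    """
    n = len(bought)
    adj = [set() for _ in range(n)]
    for u in range(n):
        for v in bought[u]:
            adj[u].add(v)
            adj[v].add(u)
    dist = distances_from(adj, i)
    if len(dist) < n:          # some node is unreachable
        return math.inf
    return alpha * len(bought[i]) + sum(dist.values())

# A 4-node star in which the centre (node 0) pays for every link.
star = [{1, 2, 3}, set(), set(), set()]
print(agent_cost(star, 0, alpha=1.0), agent_cost(star, 1, alpha=1.0))
```

In the star, the centre pays for three links but is one hop from everyone, while a leaf pays nothing but is two hops from the other leaves; varying $\alpha$ shifts which of the two preferences dominates.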

Dynamics for Co-Evolving Networks 

Let us consider a set of agents that play repeated games with each other. Each round of the game proceeds as follows: first, an agent has to choose which other agent he wants to play with. Then, assuming that the other agent has agreed to play, the agent has to choose an appropriate action from the pool of available actions. Thus, to define an overall game strategy, we have to specify how an agent chooses a partner for the game and a particular action.

For the sake of simplicity, let us start with three agents, which is the minimum number required for non-trivial dynamics. Let us differentiate those agents by the indices $x$, $y$, and $z$. Here we focus on the case when the number of actions available to agents is finite. The time-dependent mixed strategies of agents can be characterized by a probability distribution over the choice of neighbors and actions. For instance, $p^i_{xy}(t)$ is the probability that agent $x$ will choose to play with agent $y$ and perform action $i$ at time $t$.

Furthermore, we assume that the agents adapt to their environment through a simple reinforcement mechanism. Among different reinforcement schemes, here we focus on (stateless) Q-learning (Watkins and Dayan 1992). Within this scheme, the agents' strategies are parameterized through so-called Q-functions that characterize the relative utility of a particular strategy. After each round of the game, the Q-functions are updated according to the following rule:

$$Q^i_{xy}(t+1) = Q^i_{xy}(t) + \alpha\left[R^i_{xy}(t) - Q^i_{xy}(t)\right] \qquad (1)$$

where $R^i_{xy}$ is the expected reward of agent $x$ for playing action $i$ with agent $y$, and $\alpha$ is a parameter that determines the learning rate (which can be set to $\alpha = 1$ without loss of generality).
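The update rule (1) amounts to one line per (partner, action) pair. In this sketch, assumed for illustration, the Q-values and rewards of a single agent $x$ are stored in dictionaries keyed by $(y, i)$; with a fixed reward, repeated updates relax each $Q^i_{xy}$ geometrically toward $R^i_{xy}$, and $\alpha = 1$ copies the reward in a single step.

```python
def q_update(q, reward, alpha=1.0):
    """One stateless Q-learning step, Eq. (1), for every (partner, action) pair.

    q and reward are dicts keyed by (y, i); reward holds R^i_{xy}.
    """
    return {key: q[key] + alpha * (reward[key] - q[key]) for key in q}

q = {("y", 0): 0.0, ("y", 1): 0.0, ("z", 0): 0.0, ("z", 1): 0.0}
reward = {("y", 0): 1.0, ("y", 1): 0.0, ("z", 0): 0.2, ("z", 1): -0.5}
for _ in range(50):
    # each step shrinks the gap |Q - R| by the factor (1 - alpha) = 0.9
    q = q_update(q, reward, alpha=0.1)
print(q)
```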

Next, we have to specify how agents choose a particular neighbor and an action based on their Q-functions. Here we use the Boltzmann exploration mechanism, where the probability of a particular choice is given as (Sutton and Barto 1998)

$$p^i_{xy}(t) = \frac{e^{\beta Q^i_{xy}(t)}}{\sum_{\tilde{y}, j} e^{\beta Q^j_{x\tilde{y}}(t)}} \qquad (2)$$

Here the inverse temperature $\beta = 1/T > 0$ controls the exploration/exploitation tradeoff: for $T \to 0$ the agent always chooses the action corresponding to the maximum Q-value, while for $T \to \infty$ the agents' choices are completely random.
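Rule (2) is a softmax over all (partner, action) pairs of one agent, normalized jointly over $\tilde{y}$ and $j$. The sketch below (function name assumed) subtracts the largest Q-value before exponentiating, which leaves the probabilities unchanged but avoids overflow at small $T$; the example exercises both limits discussed above.

```python
import math

def boltzmann(q, T):
    """Eq. (2): probability of each (partner, action) key given its Q-value.

    Subtracting the largest Q-value first leaves the result unchanged but
    prevents overflow in exp() when T is small.
    """
    beta = 1.0 / T
    q_max = max(q.values())
    weights = {key: math.exp(beta * (value - q_max)) for key, value in q.items()}
    total = sum(weights.values())
    return {key: w / total for key, w in weights.items()}

q = {("y", 0): 1.0, ("y", 1): 0.0, ("z", 0): 0.5, ("z", 1): 0.0}
p_cold = boltzmann(q, T=0.01)    # near-greedy: almost all mass on ("y", 0)
p_hot = boltzmann(q, T=1000.0)   # near-uniform over the four choices
print(p_cold)
print(p_hot)
```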

We now assume that the agents interact with each other many times between two consecutive updates of their strategies. In this case, the reward $R^i_{xy}$ of agent $x$ in Equation (1) should be understood as the average reward, where the average is taken over the strategies of the other agents:

$$R^i_{xy} = \sum_j A^{ij}_{xy}\, p^j_{yx},$$

where $A^{ij}_{xy}$ is the reward (payoff) of agent $x$ playing strategy $i$ against agent $y$, who plays strategy $j$. Note that, generally speaking, the payoff might be asymmetric.
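The average reward can be computed directly from its definition. This is a hedged sketch with an assumed data layout: payoffs in `A[(x, y)][i][j]` and the other agent's joint partner/action probabilities in `p[(y, x)][j]` (actions are 0-based, so action 0 is the first action). Because $p^j_{yx}$ includes the probability that $y$ chooses $x$, the reward for choosing $y$ shrinks when $y$ rarely reciprocates.

```python
def average_reward(A, p, x, y, i):
    """R^i_{xy} = sum_j A^{ij}_{xy} p^j_{yx}.

    A[(x, y)][i][j] is the payoff to x for action i against y's action j;
    p[(y, x)][j] is the probability that y picks x as a partner AND plays j,
    so these entries sum to y's probability of choosing x, not to one.
    """
    row = A[(x, y)][i]
    return sum(row[j] * p[(y, x)][j] for j in range(len(row)))

# Coordination payoffs; y chooses x half of the time and then plays action 0 w.p. 0.8.
A = {("x", "y"): [[1.0, 0.0], [0.0, 0.0]]}
p = {("y", "x"): [0.5 * 0.8, 0.5 * 0.2]}
print(average_reward(A, p, "x", "y", 0), average_reward(A, p, "x", "y", 1))
```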

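We are interested in the continuous approximation to the learning dynamics. Thus, we replace $t + 1 \to t + \delta t$, $\alpha \to \alpha\,\delta t$, and take the limit $\delta t \to 0$ in (1) to obtain

$$\frac{dQ^i_{xy}}{dt} = \alpha\left[R^i_{xy} - Q^i_{xy}\right] \qquad (3)$$

Differentiating (2) using Eqs. (2) and (3), where (2) is used to express $\beta Q^i_{xy}$ through $\ln p^i_{xy}$, and rescaling time as $t \to \alpha\beta t$, we obtain the following replicator equation (Sato and Crutchfield 2003):

$$\frac{\dot{p}^i_{xy}}{p^i_{xy}} = R^i_{xy} - \sum_{\tilde{y}, j} p^j_{x\tilde{y}} R^j_{x\tilde{y}} + T \sum_{\tilde{y}, j} p^j_{x\tilde{y}} \ln\frac{p^j_{x\tilde{y}}}{p^i_{xy}} \qquad (4)$$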
Equation (4) describes the collective adaptation of the Q-learning agents through repeated game-dynamical interactions. The first two terms indicate that the probability of playing a particular pure strategy increases at a rate proportional to the overall goodness of that strategy, i.e., how much its reward exceeds the average reward, which mimics the fitness-based selection mechanism in population biology (Hofbauer and Sigmund 1998). The last term, which has an entropic meaning, does not have a direct analogue in population biology (Sato and Crutchfield 2003). This term is due to the Boltzmann selection mechanism and thus describes the agents' tendency to randomize over their strategies. Note that for $T = 0$ this term disappears and the equations reduce to the conventional replicator system (Hofbauer and Sigmund 1998).
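The step from (2) and (3) to (4) can be checked numerically. The sketch below (names and numbers are illustrative) treats the choices of one agent as a flat list and holds the rewards fixed at one instant. It pushes the Q-flow of Eq. (3) through the softmax of Eq. (2) by a central finite difference, then compares the result with Eq. (4) multiplied by $\alpha\beta$, the factor removed by the time rescaling $t \to \alpha\beta t$.

```python
import math

def softmax(q, beta):
    """Boltzmann probabilities, Eq. (2), over a flat list of (partner, action) choices."""
    m = max(q)
    w = [math.exp(beta * (v - m)) for v in q]
    z = sum(w)
    return [wi / z for wi in w]

def replicator_rhs(p, r, T):
    """Eq. (4) in rescaled time: p_k [(r_k - mean reward) + T sum_j p_j ln(p_j / p_k)]."""
    rbar = sum(pj * rj for pj, rj in zip(p, r))
    return [pk * (rk - rbar + T * sum(pj * math.log(pj / pk) for pj in p))
            for pk, rk in zip(p, r)]

alpha, T = 0.3, 0.5
beta = 1.0 / T
q = [0.2, -0.1, 0.4]            # illustrative Q-values for three (y, i) choices
r = [1.0, 0.0, 0.5]             # illustrative average rewards, held fixed
qdot = [alpha * (rk - qk) for rk, qk in zip(r, q)]   # Eq. (3)

# dp/dt obtained by pushing the Q-flow of Eq. (3) through Eq. (2) (central difference).
h = 1e-5
p_plus = softmax([qk + h * d for qk, d in zip(q, qdot)], beta)
p_minus = softmax([qk - h * d for qk, d in zip(q, qdot)], beta)
pdot_numeric = [(a - b) / (2 * h) for a, b in zip(p_plus, p_minus)]

# The same derivative from Eq. (4); the factor alpha * beta undoes the time rescaling.
pdot_replicator = [alpha * beta * v for v in replicator_rhs(softmax(q, beta), r, T)]
print(pdot_numeric)
print(pdot_replicator)
```

The two lists agree to finite-difference accuracy, and the entries of Eq. (4) sum to zero, as they must for a probability vector.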

So far, our discussion has been very general. We now make the assumption that the agents' strategies can be factorized as follows:

$$p^i_{xy} = c_{xy}\, p^i_x, \qquad \sum_y c_{xy} = 1, \qquad \sum_i p^i_x = 1 \qquad (5)$$

Here $c_{xy}$ is the probability that agent $x$ will initiate a game with agent $y$, whereas $p^i_x$ is the probability that he will choose action $i$. Thus, the assumption behind this factorization is that the probability that an agent will perform action $i$ does not depend on whom the game is played against.

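To proceed further, we substitute (5) into (4), sum both sides of the resulting equation once over $y$ and then over $i$, and make use of the normalization conditions in Eq. (5) to obtain the following system:

$$\frac{\dot{p}^i_x}{p^i_x} = \sum_y c_{xy} R^i_{xy} - \sum_{j, y} c_{xy}\, p^j_x R^j_{xy} + T \sum_j p^j_x \ln\frac{p^j_x}{p^i_x} \qquad (6)$$

$$\frac{\dot{c}_{xy}}{c_{xy}} = \sum_i p^i_x R^i_{xy} - \sum_{i, \tilde{y}} c_{x\tilde{y}}\, p^i_x R^i_{x\tilde{y}} + T \sum_{\tilde{y}} c_{x\tilde{y}} \ln\frac{c_{x\tilde{y}}}{c_{xy}} \qquad (7)$$

where the reward now reads $R^i_{xy} = \sum_j A^{ij}_{xy}\, c_{yx}\, p^j_y$; that is, linking to $y$ pays off for $x$ only to the extent that $y$ also chooses $x$. Equations (6) and (7) are the replicator equations that describe the collective and mutual evolution of the agent strategies and the network structure, taking into account the explicit coupling between the strategies and the link weights. Our preliminary analysis suggests that this co-evolutionary system can demonstrate very rich behavior even for simple games. Below we illustrate the framework on two simple examples.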
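A sketch of the right-hand sides of (6) and (7) for the three-agent case, under our reading that $R^i_{xy} = \sum_j A^{ij}_{xy} c_{yx} p^j_y$ once (5) is substituted. As a consistency check, it also evaluates Eq. (4) on the factorized strategies and confirms that summing over $i$ reproduces $\dot{c}_{xy}$ while summing over $y$ reproduces $\dot{p}^i_x$, which is how (6) and (7) were obtained. The payoffs and initial values are random placeholders, and all function names are hypothetical.

```python
import math
import random

AGENTS = ("x", "y", "z")

def others(a):
    return [b for b in AGENTS if b != a]

def reward(A, c, p, x, y, i):
    """R^i_{xy} = sum_j A^{ij}_{xy} c_{yx} p^j_y (p^j_{yx} in factorized form)."""
    return sum(A[(x, y)][i][j] * c[(y, x)] * p[y][j] for j in range(len(p[y])))

def coevolution_rhs(A, c, p, T):
    """Right-hand sides of Eq. (6) (dp^i_x/dt) and Eq. (7) (dc_xy/dt)."""
    dp, dc = {}, {}
    for x in AGENTS:
        acts = range(len(p[x]))
        R = {(y, i): reward(A, c, p, x, y, i) for y in others(x) for i in acts}
        rbar = sum(c[(x, y)] * p[x][i] * R[(y, i)] for (y, i) in R)
        for i in acts:
            gain = sum(c[(x, y)] * R[(y, i)] for y in others(x)) - rbar
            ent = sum(p[x][j] * math.log(p[x][j] / p[x][i]) for j in acts)
            dp[(x, i)] = p[x][i] * (gain + T * ent)
        for y in others(x):
            gain = sum(p[x][i] * R[(y, i)] for i in acts) - rbar
            ent = sum(c[(x, w)] * math.log(c[(x, w)] / c[(x, y)]) for w in others(x))
            dc[(x, y)] = c[(x, y)] * (gain + T * ent)
    return dp, dc

def full_rhs(A, c, p, T):
    """Eq. (4) evaluated on the factorized strategies p^i_{xy} = c_{xy} p^i_x."""
    out = {}
    for x in AGENTS:
        P = {(y, i): c[(x, y)] * p[x][i] for y in others(x) for i in range(len(p[x]))}
        rbar = sum(P[k] * reward(A, c, p, x, *k) for k in P)
        for k in P:
            ent = sum(P[other] * math.log(P[other] / P[k]) for other in P)
            out[(x,) + k] = P[k] * (reward(A, c, p, x, *k) - rbar + T * ent)
    return out

# Random two-action games, link weights, and action probabilities (placeholders).
rng = random.Random(0)
A = {(a, b): [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
     for a in AGENTS for b in others(a)}
c, p = {}, {}
for a in AGENTS:
    u, v = rng.uniform(0.1, 0.9), rng.uniform(0.1, 0.9)
    b1, b2 = others(a)
    c[(a, b1)], c[(a, b2)] = u, 1 - u
    p[a] = [v, 1 - v]

T = 0.3
dp, dc = coevolution_rhs(A, c, p, T)
full = full_rhs(A, c, p, T)
```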

Examples 

Our preliminary results indicate that the co-evolutionary system of Equations (6) and (7) can have very rich behavior even for simple games. The full analysis of those equations will be reported elsewhere. Here we consider two simple examples and focus on the link dynamics (Eq. (7)), assuming that the agents play the Nash equilibrium strategies of the corresponding two-agent game.

Our first example is a simple coordination game with the following (two-agent) payoff matrix:

$$A = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$$

Thus, agents get a unit reward if they jointly take the first action, and get no reward otherwise.

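We assume that the agents always play the same action (e.g., $p^1_x = p^1_y = p^1_z = 1$), yielding a reward $A^{11} = 1$, so we can focus on the link dynamics. In this case $R^1_{xy} = c_{yx}$: agent $x$ is rewarded for choosing $y$ only to the extent that $y$ also chooses $x$. Using the normalization $c_{xz} = 1 - c_{xy}$, $c_{yx} = 1 - c_{yz}$, and $c_{zy} = 1 - c_{zx}$, the equations characterizing the link dynamics are as follows:

$$\begin{aligned}
\dot{c}_{xy} &= c_{xy}(1-c_{xy})\left[1 - c_{yz} - c_{zx} + T\ln\frac{1-c_{xy}}{c_{xy}}\right],\\
\dot{c}_{yz} &= c_{yz}(1-c_{yz})\left[1 - c_{zx} - c_{xy} + T\ln\frac{1-c_{yz}}{c_{yz}}\right],\\
\dot{c}_{zx} &= c_{zx}(1-c_{zx})\left[1 - c_{xy} - c_{yz} + T\ln\frac{1-c_{zx}}{c_{zx}}\right].
\end{aligned}$$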
We note that the system allows different rest points, some of which correspond to pure Nash equilibria (NE). For instance, one such configuration (at $T = 0$) is $c_{xy} = 1 - c_{yz} = 1$, while $c_{zx}$ can have an arbitrary value. In this NE configuration, agents $x$ and $y$ always play against each other, while agent $z$ is isolated. In addition, there is an interior rest point at $c_{xy} = c_{yz} = c_{zx} = 1/2$, which is again a NE configuration. A simple analysis shows that this symmetric rest point is unstable if the temperature is below a certain critical value. This can be shown by linearizing the system around the symmetric rest point, which yields the following Jacobian:

$$J = \begin{pmatrix} -T & -\frac{1}{4} & -\frac{1}{4} \\ -\frac{1}{4} & -T & -\frac{1}{4} \\ -\frac{1}{4} & -\frac{1}{4} & -T \end{pmatrix}$$
The eigenvalues of this Jacobian are $\lambda_1 = -T - 1/2$ and $\lambda_{2,3} = -T + 1/4$. It is thus straightforward to see that for $4T > 1$ all three eigenvalues become negative, making the interior rest point stable. The dynamics of the links for various temperatures are shown in Figure 1.
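The stability claim can be checked numerically. This sketch assumes the reduced link equations obtained from Eq. (7) with $R^1_{xy} = c_{yx}$, namely $\dot{c}_{xy} = c_{xy}(1-c_{xy})[1 - c_{yz} - c_{zx} + T\ln((1-c_{xy})/c_{xy})]$ and its cyclic permutations. It estimates the Jacobian eigenvalues at the symmetric point by finite differences along the eigenvectors $(1,1,1)$ and $(1,-1,0)$, and integrates the system with forward Euler from a slightly perturbed start; the step size, horizon, and function names are arbitrary illustrative choices.

```python
import math

def coordination_links(c, T):
    """Reduced link equations for the coordination example (everyone plays action 1).

    c = (c_xy, c_yz, c_zx); the other links follow from normalization:
    c_xz = 1 - c_xy, c_yx = 1 - c_yz, c_zy = 1 - c_zx.
    """
    def rate(a, b, d):
        # a is the link being updated; b and d are the other two independent links.
        return a * (1 - a) * (1 - b - d + T * math.log((1 - a) / a))
    cxy, cyz, czx = c
    return (rate(cxy, cyz, czx), rate(cyz, czx, cxy), rate(czx, cxy, cyz))

def eigen_at_symmetric_point(T, h=1e-6):
    """Jacobian eigenvalues at c = (1/2, 1/2, 1/2) along the eigenvectors (1,1,1) and (1,-1,0)."""
    def directional(v):
        plus = coordination_links([0.5 + h * vk for vk in v], T)
        minus = coordination_links([0.5 - h * vk for vk in v], T)
        return (plus[0] - minus[0]) / (2 * h)   # first component of J v, and v[0] = 1
    return directional((1, 1, 1)), directional((1, -1, 0))

def integrate(c, T, dt=0.01, steps=20000):
    """Forward-Euler integration of the link equations."""
    c = list(c)
    for _ in range(steps):
        c = [ci + dt * di for ci, di in zip(c, coordination_links(c, T))]
    return c

for T in (0.1, 0.2, 0.3, 0.5):
    lam_sym, lam_asym = eigen_at_symmetric_point(T)
    print(T, round(lam_sym, 4), round(lam_asym, 4))
```

For $T = 0.5$ a small perturbation decays back to $c = 1/2$, whereas for $T = 0.1$ it grows and the links settle far from the symmetric point.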

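Figure 1: Dynamics of links for various temperatures.

As a second example, we consider the Rock-Paper-Scissors (RPS) game, which has the following payoff matrix:

$$A = \begin{pmatrix} \epsilon & -1 & 1 \\ 1 & \epsilon & -1 \\ -1 & 1 & \epsilon \end{pmatrix}$$

where $-1 < \epsilon < 1$. This game has a mixed Nash equilibrium where all the strategies are played with equal probabilities $1/3$. Note that the RPS game can have very rich and interesting dynamics even for two players. For instance, it has been noted that for a two-person RPS game at $T = 0$ the dynamical system might show chaotic behavior in a certain range of $\epsilon$ and never reach an equilibrium (Sato, Akiyama, and Farmer 2002). Again, here we focus on the link dynamics by assuming that the agents play the NE-prescribed mixed strategies, $p^i_x = 1/3$, so that $R^i_{xy} = \epsilon\, c_{yx}/3$ for every action $i$. The equations describing the evolution of the links are as follows:

$$\begin{aligned}
\dot{c}_{xy} &= c_{xy}(1-c_{xy})\left[\frac{\epsilon}{3}(1 - c_{yz} - c_{zx}) + T\ln\frac{1-c_{xy}}{c_{xy}}\right],\\
\dot{c}_{yz} &= c_{yz}(1-c_{yz})\left[\frac{\epsilon}{3}(1 - c_{zx} - c_{xy}) + T\ln\frac{1-c_{yz}}{c_{yz}}\right],\\
\dot{c}_{zx} &= c_{zx}(1-c_{zx})\left[\frac{\epsilon}{3}(1 - c_{xy} - c_{yz}) + T\ln\frac{1-c_{zx}}{c_{zx}}\right].
\end{aligned}$$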
This system has a number of different rest points. For instance, for $-1 < \epsilon < 0$ and $T = 0$, the stable rest points correspond to a directed triangle with no reciprocated links,

$$c_{xy} = c_{yz} = c_{zx} = 1 \quad \text{or} \quad c_{xy} = c_{yz} = c_{zx} = 0.$$

This is expected, since for $-1 < \epsilon < 0$ the average reward is negative, so the agents are better off not playing with each other at all. There is also an interior rest point at $c_{xy} = c_{yz} = c_{zx} = 1/2$. As in the previous example, it can be shown that there is a critical value of $T$ below which this rest point is unstable. Indeed, the Jacobian around the symmetric rest point is as follows:

$$J = \begin{pmatrix} -T & -\frac{\epsilon}{12} & -\frac{\epsilon}{12} \\ -\frac{\epsilon}{12} & -T & -\frac{\epsilon}{12} \\ -\frac{\epsilon}{12} & -\frac{\epsilon}{12} & -T \end{pmatrix}$$
Its eigenvalues are $\lambda_1 = -T - \epsilon/6$ and $\lambda_{2,3} = -T + \epsilon/12$. A simple calculation then shows that the interior rest point becomes stable whenever $T > \epsilon/12$ for $0 < \epsilon < 1$, and $T > |\epsilon|/6$ for $-1 < \epsilon < 0$.
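The critical temperatures for the RPS case can be recovered numerically. Assuming the reduced equations $\dot{c}_{xy} = c_{xy}(1-c_{xy})[(\epsilon/3)(1 - c_{yz} - c_{zx}) + T\ln((1-c_{xy})/c_{xy})]$ and their cyclic permutations, which follow from Eq. (7) when every agent plays each action with probability $1/3$, the sketch finds by bisection the temperature at which the leading eigenvalue at the symmetric point crosses zero. Names, tolerances, and the bracketing interval are illustrative assumptions.

```python
import math

def rps_links(c, T, eps):
    """Reduced link equations for RPS with NE play p^i = 1/3 (reward eps * c_yx / 3)."""
    def rate(a, b, d):
        # a is the link being updated; b and d are the other two independent links.
        return a * (1 - a) * (eps / 3 * (1 - b - d) + T * math.log((1 - a) / a))
    cxy, cyz, czx = c
    return (rate(cxy, cyz, czx), rate(cyz, czx, cxy), rate(czx, cxy, cyz))

def leading_eigenvalue(T, eps, h=1e-6):
    """Largest Jacobian eigenvalue at c = (1/2, 1/2, 1/2), via eigenvectors (1,1,1) and (1,-1,0)."""
    def directional(v):
        plus = rps_links([0.5 + h * vk for vk in v], T, eps)
        minus = rps_links([0.5 - h * vk for vk in v], T, eps)
        return (plus[0] - minus[0]) / (2 * h)
    return max(directional((1, 1, 1)), directional((1, -1, 0)))

def critical_T(eps, lo=1e-6, hi=1.0):
    """Bisection for the temperature above which the symmetric rest point is stable."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if leading_eigenvalue(mid, eps) < 0:
            hi = mid
        else:
            lo = mid
    return hi

print(critical_T(0.6), critical_T(-0.6))
```

The bisection returns values close to $\epsilon/12$ for positive $\epsilon$ and $|\epsilon|/6$ for negative $\epsilon$; for $\epsilon = 0$ the symmetric point is stable at every $T > 0$, so the bracketing assumption fails there.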

Discussion 

In conclusion, we have presented a replicator-dynamics-based framework for studying the mutual evolution of network topology and agent behavior in a network-augmented system of interacting adaptive agents. By assuming that the agents' strategies allow an appropriate factorization, we derived a system of coupled replicator equations that describe the mutual evolution of agent behavior and network link structure. The examples analyzed here were for simplified scenarios. As future work, we plan to perform a more thorough analysis of the dynamics of the fully coupled system. Furthermore, we intend to go beyond the three-agent systems considered here and examine larger systems. Finally, we note that the main premise behind our model is that the strategies can be factorized according to Eq. (5). While this assumption is justified for certain types of games, it might not hold for other games, where factorization can destroy some NE points that exist in the non-factorized strategy spaces. We intend to examine this question more thoroughly in our future work.